New algorithms of the Q-learning type

نویسندگان

  • Shalabh Bhatnagar
  • K. Mohan Babu
چکیده

We propose two algorithms for Q-learning that use the two timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state-action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented on a few different settings.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features

Point cloud and LiDAR Filtering is removing non-ground features from digital surface model (DSM) and reaching the bare earth and DTM extraction. Various methods have been proposed by different researchers to distinguish between ground and non- ground in points cloud and LiDAR data. Most fully automated methods have a common disadvantage, and they are only effective for a particular type of surf...

متن کامل

Improving the Performance of Machine Learning Algorithms for Heart Disease Diagnosis by Optimizing Data and Features

Heart is one of the most important members of the body, and heart disease is the major cause of death in the world and Iran. This is why the early/on time diagnosis is one of the significant basics for preventing and reducing deaths of this disease. So far, many studies have been done on heart disease with the aim of prediction, diagnosis, and treatment. However, most of them have been mostly f...

متن کامل

Machine learning algorithms in air quality modeling

Modern studies in the field of environment science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. The recent advances in statistical modeling based on machine learning approaches have emerged as solution to tackle these issues. It is a fact that, input variable type largely affec...

متن کامل

Evaluating project’s completion time with Q-learning

Nowadays project management is a key component in introductory operations management. The educators and the researchers in these areas advocate representing a project as a network and applying the solution approaches for network models to them to assist project managers to monitor their completion. In this paper, we evaluated project’s completion time utilizing the Q-learning algorithm. So the ...

متن کامل

A Discrete Hybrid Teaching-Learning-Based Optimization algorithm for optimization of space trusses

In this study, to enhance the optimization process, especially in the structural engineering field two well-known algorithms are merged together in order to achieve an improved hybrid algorithm. These two algorithms are Teaching-Learning Based Optimization (TLBO) and Harmony Search (HS) which have been used by most researchers in varied fields of science. The hybridized algorithm is called A Di...

متن کامل

P14: Anxiety Control Using Q-Learning

Anxiety disorders are the most common reasons for referring to specialized clinics. If the response to stress changed, anxiety can be greatly controlled. The most obvious effect of stress occurs on circulatory system especially through sweating. the electrical conductivity of skin or in other words Galvanic Skin Response (GSR) which is dependent on stress level is used; beside this parameter pe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Automatica

دوره 44  شماره 

صفحات  -

تاریخ انتشار 2008